13 research outputs found
Misclassification analysis for the class imbalance problem
In classification, the class imbalance issue typically causes the learning algorithm to be dominated by the majority classes, while features of the minority classes are often ignored. This can in turn distort how humans visualise and interpret the data. Special care is therefore needed in the learning algorithm to improve accuracy on the minority classes. In this study, the use of misclassification analysis is investigated for data re-distribution. Several under-sampling techniques and hybrid techniques based on misclassification analysis are proposed. Benchmark data sets from the University of California Irvine (UCI) machine learning repository are used to evaluate the performance of the proposed techniques. The results show that the proposed hybrid technique achieves the best performance in the experiments.
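The re-distribution idea above can be sketched in a few lines. This is a hedged illustration, not the paper's actual algorithm: the function name, signature, and the rule "drop majority instances that a preliminary model misclassifies" are assumptions about how misclassification-analysis-based under-sampling could look.

```python
# Hypothetical sketch of misclassification-analysis under-sampling:
# majority-class instances that a preliminary model misclassifies are
# treated as borderline/noisy and removed before the final model is trained.
def undersample_by_misclassification(X, y, preds, majority_label):
    """Keep all minority instances; drop majority instances whose
    preliminary prediction disagrees with the true label."""
    kept = [(x, t) for x, t, p in zip(X, y, preds)
            if t != majority_label or p == t]
    Xr = [x for x, _ in kept]
    yr = [t for _, t in kept]
    return Xr, yr
```

A hybrid variant, as described in the abstract, would combine such a filter with an over-sampling step for the minority class.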
Deep Over-sampling Framework for Classifying Imbalanced Data
Class imbalance is a challenging issue in practical classification problems
for deep learning models as well as traditional models. Traditionally
successful countermeasures such as synthetic over-sampling have had limited
success with complex, structured data handled by deep learning models. In this
paper, we propose Deep Over-sampling (DOS), a framework for extending the
synthetic over-sampling method to exploit the deep feature space acquired by a
convolutional neural network (CNN). Its key feature is an explicit, supervised
representation learning, for which the training data presents each raw input
sample with a synthetic embedding target in the deep feature space, which is
sampled from the linear subspace of in-class neighbors. We implement an
iterative process of training the CNN and updating the targets, which induces
smaller in-class variance among the embeddings, to increase the discriminative
power of the deep representation. We present an empirical study using public
benchmarks, which shows that the DOS framework not only counteracts class
imbalance better than the existing method, but also improves the performance of
the CNN in the standard, balanced settings.
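The core sampling step described above, drawing a synthetic embedding target from the linear subspace spanned by in-class neighbors, can be sketched as a random convex combination. This is a minimal illustration under stated assumptions; the function name and the choice of convex (rather than general linear) weights are hypothetical simplifications, not the DOS paper's exact formulation.

```python
import random

def synthetic_embedding_target(z, in_class_embeddings, k=3, rng=None):
    """Sample a target for embedding z from the span of its k nearest
    in-class neighbors (hedged sketch of the DOS target construction)."""
    rng = rng or random.Random(0)
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # k nearest neighbors of z within the same class, in deep feature space
    neighbors = sorted(in_class_embeddings, key=lambda e: dist(e, z))[:k]
    # random convex weights (normalized to sum to 1)
    w = [rng.random() for _ in neighbors]
    s = sum(w)
    w = [wi / s for wi in w]
    # weighted combination per embedding dimension
    return [sum(wi * ei for wi, ei in zip(w, col))
            for col in zip(*neighbors)]
```

In the iterative scheme the abstract describes, the CNN would be retrained toward these targets and the targets recomputed, shrinking in-class variance round by round.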
A novel feature selection-based sequential ensemble learning method for class noise detection in high-dimensional data
© 2018, Springer Nature Switzerland AG. Irrelevant or noisy features in high-dimensional data pose significant challenges to mislabeled-instance detection methods based on feature selection. Traditional methods typically perform two dependent steps: first, searching for a relevant feature subspace; second, training a model on the subspace obtained in the first step. However, a feature subspace selected without regard to the noise scores can hurt detection performance. In this paper, we propose SENF, a novel sequential ensemble method that merges the two phases: it learns sequential ensembles that refine the feature subspace and improve detection accuracy through iterative sparse modeling with the noise scores as the regression target attribute. Through extensive experiments on 8 real-world high-dimensional datasets from the UCI machine learning repository [3], we show that SENF performs significantly better than, or at least comparably to, the individual baselines as well as the existing state-of-the-art label noise detection method.
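One iteration of the subspace-refinement loop described above might look like the following. This is a deliberately simplified stand-in: SENF fits a sparse regressor (e.g. an L1-penalized model) with the noise scores as target, whereas this sketch ranks features by plain correlation with those scores; the function name and parameters are hypothetical.

```python
def select_features_by_noise_correlation(X, noise_scores, keep=5):
    """One simplified iteration of SENF-style subspace refinement:
    rank features by |Pearson correlation| with the current noise scores
    and keep the top `keep` indices. (Correlation is a stand-in for the
    sparse regression the actual method uses.)"""
    n = len(X)
    ms = sum(noise_scores) / n
    def strength(j):
        col = [row[j] for row in X]
        mx = sum(col) / n
        cov = sum((c - mx) * (s - ms) for c, s in zip(col, noise_scores))
        vx = sum((c - mx) ** 2 for c in col) ** 0.5
        vs = sum((s - ms) ** 2 for s in noise_scores) ** 0.5
        return abs(cov / (vx * vs)) if vx and vs else 0.0
    ranked = sorted(range(len(X[0])), key=strength, reverse=True)
    return sorted(ranked[:keep])
```

In the full sequential ensemble, the noise scores would be recomputed in the refined subspace and fed back into the next round.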
Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-Tailed Classification
In real-world scenarios, data tends to exhibit a long-tailed distribution,
which increases the difficulty of training deep networks. In this paper, we
propose a novel self-paced knowledge distillation framework, termed Learning
From Multiple Experts (LFME). Our method is inspired by the observation that
networks trained on less imbalanced subsets of the distribution often yield
better performances than their jointly-trained counterparts. We refer to these
models as 'Experts', and the proposed LFME framework aggregates the knowledge
from multiple 'Experts' to learn a unified student model. Specifically, the
proposed framework involves two levels of adaptive learning schedules:
Self-paced Expert Selection and Curriculum Instance Selection, so that the
knowledge is adaptively transferred to the 'Student'. We conduct extensive
experiments and demonstrate that our method is able to achieve superior
performances compared to state-of-the-art methods. We also show that our method
can be easily plugged into state-of-the-art long-tailed classification
algorithms for further improvements. Comment: ECCV 2020 Spotlight
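The knowledge-aggregation step described above can be sketched as a weighted distillation loss over the experts' softened outputs. This is a hedged illustration only: the function, the fixed expert weights, and the KL formulation are generic distillation ingredients, not LFME's exact self-paced schedules, which adapt those weights during training.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, expert_logits_list, expert_weights, T=2.0):
    """Weighted sum of KL(expert || student) over multiple experts
    (hypothetical sketch of aggregating experts' soft targets)."""
    p_student = softmax(student_logits, T)
    loss = 0.0
    for w, el in zip(expert_weights, expert_logits_list):
        p_expert = softmax(el, T)
        loss += w * sum(pe * math.log(pe / ps)
                        for pe, ps in zip(p_expert, p_student))
    return loss
```

In LFME's self-paced scheme, the per-expert weights would themselves be scheduled based on the student's progress rather than fixed as here.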
Data cleaning for classification using misclassification analysis
In many classification problems, data cleaning is used as a preprocessing technique to achieve better results. The purpose of data cleaning is to remove noise, inconsistent data, and errors from the training data, enabling a cleaner, more representative data set for developing a reliable classification model; unclean data can degrade a model's classification accuracy. In this paper, we investigate the use of misclassification analysis for data cleaning. To demonstrate the concept, we use an Artificial Neural Network (ANN) as the core computational intelligence technique. Four benchmark data sets from the University of California Irvine (UCI) machine learning repository are used to evaluate the proposed cleaning technique. All are binary classification problems: German credit data, BUPA liver disorders, Johns Hopkins Ionosphere, and Pima Indians Diabetes. The results show that the proposed cleaning technique can be a good alternative for providing some confidence when constructing a classification model.
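The cleaning step described above can be sketched generically: instances that a preliminary model (an ANN in the paper) misclassifies are treated as likely noise and removed. The helper name and the "drop every misclassified instance" rule are assumptions for illustration, not the paper's exact procedure.

```python
def clean_training_data(X, y, predict):
    """Misclassification-analysis cleaning (hedged sketch): a preliminary
    model's predict() is assumed already trained; training instances it
    gets wrong are treated as noise and dropped."""
    pairs = [(x, t) for x, t in zip(X, y) if predict(x) == t]
    return [x for x, _ in pairs], [t for _, t in pairs]
```

The final classifier would then be retrained on the cleaned set.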
Classification of imbalanced data by combining the complementary neural network and SMOTE algorithm
In classification, when the distribution of training data among classes is uneven, the learning algorithm is generally dominated by the features of the majority classes, and features of the minority classes are difficult to recognize fully. In this paper, a method is proposed to enhance classification accuracy for the minority classes. The proposed method combines the Synthetic Minority Over-sampling Technique (SMOTE) and the Complementary Neural Network (CMTNN) to handle the problem of classifying imbalanced data. To demonstrate that the proposed technique can assist classification of imbalanced data, several classification algorithms are used: Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM). Benchmark data sets with various ratios between the minority class and the majority class are obtained from the University of California Irvine (UCI) machine learning repository. The results show that the proposed combined techniques improve performance on the class imbalance problem.
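The SMOTE half of the combination above is well defined in the literature: each synthetic minority sample is an interpolation between a minority instance and one of its k nearest minority-class neighbors. A minimal sketch (function name and parameter defaults are illustrative choices):

```python
import random

def smote(minority, k=2, n_new=4, rng=None):
    """Minimal SMOTE sketch: each synthetic sample interpolates between a
    random minority instance and one of its k nearest minority neighbors."""
    rng = rng or random.Random(42)
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbors = sorted((m for m in minority if m is not x),
                           key=lambda m: dist(m, x))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

The CMTNN side of the proposed combination (complementary truth/falsity networks) is not sketched here.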
Comparing the performance of different neural networks for binary classification problems
Classification is a decision-making task that many researchers have worked on, and a number of techniques have been proposed to perform it. The neural network is one artificial intelligence technique with many successful applications to this problem. This paper presents a comparison of neural network techniques for binary classification problems. The classification performance of five different types of neural networks is compared: Back Propagation Neural Network (BPNN), Radial Basis Function Neural Network (RBFNN), General Regression Neural Network (GRNN), Probabilistic Neural Network (PNN), and Complementary Neural Network (CMTNN). The comparison is based on three benchmark data sets obtained from the UCI machine learning repository. The results show that CMTNN typically provides better classification results than the other techniques applied to these binary classification problems.
Multiclass Imbalanced Classification Using Fuzzy C-Mean and SMOTE with Fuzzy Support Vector Machine
A hybrid sampling technique is proposed that combines Fuzzy C-Mean Clustering and the Synthetic Minority Oversampling Technique (FCMSMT) for tackling the imbalanced multiclass classification problem. The mean class size is used as the target number of instances for both undersampling and oversampling. Using the mean as the fixed number of required instances for each class prevents within-class imbalanced data from being erroneously eliminated during undersampling. The technique can decrease both within-class and between-class errors, and thus can increase classification performance. The study was conducted using eight benchmark datasets from the KEEL and UCI repositories, and the results were compared across three major classifiers based on G-mean and AUC measurements. The results reveal that the proposed technique can handle most of the multiclass imbalanced datasets used in the experiments for all classifiers while retaining the integrity of the original data.
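The fixed-target resampling logic above can be sketched as follows. This is a hedged simplification: the function is hypothetical, small classes are grown by duplication rather than by the clustering-guided SMOTE the paper actually uses, and the Fuzzy C-Mean step is omitted.

```python
import random

def resample_to_mean(classes, rng=None):
    """Bring every class to the mean class size (FCMSMT-style fixed target):
    undersample larger classes, oversample smaller ones by duplication
    (a stand-in for the paper's SMOTE-based oversampling)."""
    rng = rng or random.Random(0)
    target = round(sum(len(c) for c in classes) / len(classes))
    out = []
    for c in classes:
        if len(c) >= target:
            out.append(rng.sample(c, target))      # undersample
        else:
            extra = [rng.choice(c) for _ in range(target - len(c))]
            out.append(list(c) + extra)            # oversample
    return out
```

Clustering each class first (as the paper does with Fuzzy C-Mean) is what lets undersampling avoid wiping out within-class sub-concepts.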